Abstract

Rational Numbers is an essential topic in mathematics because it underpins students’
progression to more advanced topics. Nevertheless, previous literature shows that students
have difficulty understanding the topic for numerous reasons. Teachers’ inability to
provide good examples during teaching has been identified as one of the major causes. Thus, this study
aims to develop a calibrated pool of items to help teachers give appropriate examples
for the topic of Rational Numbers. We employed a descriptive design to describe
the item statistics for the calibrated pool of items. The sample consisted of 1,292
secondary school students. We analysed the data quantitatively within the Rasch measurement
model framework. The results showed that all items measured students’ ability in rational
numbers with acceptable quality and demonstrated strong evidence of validity and
reliability. Finally, we provide suggestions on how
teachers can use the pool of items to deliver appropriate examples in the classroom.
INTRODUCTION


Success in school and beyond is greatly influenced by
mathematics proficiency (Ritchie & Bates, 2013).
According to Tian and Siegler (2018), one of the prime
factors contributing to mathematics proficiency is
knowledge of rational numbers. A rational number is
defined as any number that can be expressed as a ratio
of two integers with the denominator ≠ 0 (Blinder, 2013).
For example, 2 is a rational number since it can be
written as 2/1 or 4/2. A decimal such as 0.125 is also a rational
number since it can be expressed as 1/8. In
general, since integers can be positive, zero,
or negative, rational numbers can be as well.
Rational numbers and the concepts
connected to them are essential for learning mathematics
because understanding these concepts helps students
progress in more advanced topics (Mozacco et
al., 2013; Siegler et al., 2012). For example, since
probabilities are widely expressed as fractions, decimals,
and percentages, understanding the concept of
probability, and therefore the decision-making contexts
that rely on it, requires an understanding of the
magnitudes of these rational numbers.


The following examples give further insight
into why understanding rational numbers matters
beyond the classroom. In inferential statistics, recognizing
the different meanings of p = .01, p = .10, p = .05, and p =
.001 requires some understanding of decimals.
In engineering, fractions and decimals are routinely
used in the conversion of units, while ratios and
proportions are used in medical practice to
calculate the right dosage of medication.
Rational numbers also play an important
part in daily life. For instance, knowledge of
fractions helps us to understand discounts on items on
sale, while understanding decimals supports
precision.
LITERATURE REVIEW


Despite its importance, previous literature shows that
the topic of Rational Numbers is very challenging for
students. Notably, there are different levels of difficulty
associated with the topic. One of the most prominent
difficulties relates to the “whole number bias”
phenomenon, in which natural number rules are
applied inappropriately (Ni & Zhou, 2005). For
example, Sun (2019) listed difficulties in the
addition, subtraction, multiplication, and
division of fractions that arise because the algorithms
are not supported by whole number rules.
Further, according to van Hoof et al. (2015), whole
numbers differ from rational numbers in four distinct
aspects, namely, (1) density, (2) representation, (3)
number size, and (4) arithmetic operations. Applying
whole number rules in these aspects may lead to
systematic errors. In addition, research by Yetim and
Alkan (2013) identified basic mistakes such as failing to
convert rational numbers into decimal numbers and vice
versa, for instance stating that -8/5 is equal to -8.5. There
is also the Longer-is-Larger rule, under which numbers with more
digits are commonly considered bigger (Liu et al., 2014).
To illustrate, students who adopt the rule believe that 4.9
is smaller than 4.34 since the latter has more digits.


Apart from the “whole number bias” phenomenon,
Siegler and Lortie-Forgues (2017) identified two other
sources of difficulty encountered by students,
which they termed inherent and culturally contingent difficulties.
Inherent sources of difficulty include difficulty in
understanding individual rational numbers (such as why
- is bigger than - when 3 is bigger than 2?), the 
relationship between rational and whole number (such 
as why =+ “ == and not = 2), as well as the relation 


among rational number (for example, why + “ = * but 


On the other hand, as the name suggests, a culturally
contingent source of difficulty involves the culture
from which the learners originate. It is well
acknowledged that teachers’ knowledge differs based on
their countries of origin. For example, while Canadian
pre-service teachers find it difficult to explain the
concept of multiplication using two fractions (Siegler &
Lortie-Forgues, 2015), a large majority of their Chinese
counterparts reported otherwise (Lin et al., 2013).
One possible explanation for the disparity is
the conception of teacher professional development
(TPD). While TPD in Western countries often takes
the form of workshops that are considered
remote, inconsistent, and sometimes contradictory
(Guskey, 2003), professional development
in East Asian countries such as China can happen at any
point in teachers’ daily routine (Huang, 2006).
In addition, there is an Asian culture of learning from more
experienced teachers, which in turn improves younger
teachers’ knowledge and teaching skills (Li et al.,
2006).


Another identified source of difficulty is textbook
content. Limited coverage in textbooks may hinder
students’ efforts to understand a particular
mathematical concept. For example, Son and Senk (2010)
found that the US textbook contained fewer examples of
fraction division problems for fifth and sixth graders
than fraction multiplication problems. This may
explain the poorer results among American students.
Language is also an important
culturally contingent source of difficulty. For example,
the numerical terms used in East Asian countries seem to
facilitate students’ achievement in mathematics
(Dowker et al., 2008).


As with other topics in mathematics, one of the
important approaches to teaching rational numbers is
providing examples. The primary purpose of providing
examples is to assist retention through repetition of a
procedure so that students develop proficiency.
It is also hoped that, while working on examples,
students can construct new awareness and
understanding of both procedure and
concept. Teachers can give examples in many ways.
A common approach is to use examples when
introducing an idea or explaining a concept. Several
researchers have explored the strategies
used in providing examples. For instance, Bills and Bills
(2005) suggest that teachers should use simple examples
first, such as those with small numbers and few
operations, and use examples that build on students’
prior knowledge to scaffold their learning. Teachers
should also use examples that allow them to attend to
common errors and misconceptions (Zodik & Zaslavsky,
2008).

Yet, studies also show that teachers struggle to do just
that since they depend heavily on the examples and
exercises from the textbook. Compounding this issue is
the fact that the difficulty of the items in the textbook has
not been empirically tested. Also, there is an abundance of
items in the textbook to choose from, making it
laborious for teachers to choose the best possible
items to use as classroom examples. Moreover, the
literature shows that teachers are poor at
estimating the difficulty of a
particular item (Impara & Plake, 1998; van de Watering
& van der Rijt, 2006). Thus, teachers, especially the less
experienced, might find it challenging to find items from
the textbooks that suit their teaching at a particular
learning standard.


Calibrated Item Pool 


One possible solution is to have a
pool of calibrated items from which teachers can choose
examples for the classroom. A calibrated item pool is
defined as a group of items that have been arranged
according to their difficulty. Thus, teachers
might use easy items from the pool to introduce new
concepts and to reduce misconceptions. Gradually,
more difficult items can be added to deepen students’
understanding of a particular topic. According to the
literature, a calibrated item pool can be used for a variety
of purposes. One of the most important benefits is that it
aids in constructing tests that are relevant to the testing
objectives. For example, Aung and Lin (2020)
established a calibrated pool of 164 mathematics items
for Grade 6 children and, based on the statistics of the
items in the bank, were able to develop a
psychometrically sound new 60-item test to evaluate the
mathematical ability of average-ability students.


Classical Test Theory (CTT) and Item
Response Theory (IRT) are two widely used
measurement theories for developing calibrated item
pools. Of the two, IRT is more
commonly employed for item calibration. Many
researchers choose the Rasch measurement framework
within the IRT family because it requires fewer parameters
to be estimated and is thus easier to work with. For example,
Bjorner et al. (2017) used the Rasch model to create a pool
of high-quality items, with only five items from the pool
having a very high concordance with the score based on
all items. Kallinger et al. (2019) used the same model to
calibrate an item bank of anxiety-related questions for
orthopedic patients. The item bank serves as the
foundation for a computer-adaptive test that can be
used to assess a wide range of anxiety in orthopedic
rehabilitation patients. Meanwhile, Nieto et al. (2017)
used the adaptive power of a calibrated item pool to
demonstrate that only one-third of the pool questions are
sufficient to assess the Five-Factor Model personality
facets accurately.


Despite its potential, however, research on calibrated
item pools in education is minimal. Hence, the purpose
of the present study is to develop a calibrated item pool
for the topic of rational numbers so that teachers can use
the items effectively as examples during classroom
instruction. As a result, teachers are no longer required
to estimate the difficulty of the items they use as classroom
examples. Instead, teachers can select
items to use as examples based on their difficulty
statistics.


METHODOLOGY 


Participants 


Participants in this study consisted of 1,292 secondary
school students with an average age of 13 years. The
participants comprised 590 males (45.7%) and 702 females
(54.3%) from schools in the states of Kedah, Penang, and
Perak in northern Malaysia. The schools were selected
through purposive sampling, in which
the researchers identified schools with various levels
of achievement in mathematics.


Instrument 


This study employed ten mathematics tests that were 
administered to ten schools. The tests were conducted 
separately but were linked together by several common 
items using the common item non-equivalent group 
design (Kolen & Brennan, 2014). Altogether, we 
employed 81 common items to link the ten tests and 362 
unique items measuring 13 topics specified in the 
curriculum specifications (Ministry of Education, 2016). 
However, only results involving the topic of Rational
Numbers are presented in this article. The tests were
developed by both the researchers and
practising teachers. The content validity of each test was
reviewed by the head of the mathematics panel of each
school. The tests included both multiple-choice and
partial credit items. In the multiple-choice format,
participants chose one correct answer from a list of four
possible choices. One mark was given for a correct
answer and no mark for an incorrect answer. In the
partial credit format, the scoring was based on the
completion of the steps in solving the problem. The
marks for each item ranged from 1 to 4, and the
total marks for each test were 100. Correspondingly,
items that shared the same stem in the partial credit
format were treated as different items. Examples of a
multiple-choice item and a 2-mark partial credit item
are given in Table 1.


Data Analysis 


Table 1. Example of test items and their scoring

Format | Item | Scoring
Multiple-choice | Which of the following is correct? A -2 < -5 [...] | D: 1 mark
Partial credit | Solve the following: [...] (2 marks) | [...] or equivalent: 1 mark; [...] or equivalent: 1 mark

The quality of each item in the item pool was
examined using the Rasch model software WINSTEPS
3.74. Apart from its simplicity, the model is favored over
others in the IRT family, such as the 2-parameter model,
because it requires every item to have the same discriminating
power, so that items are characterized solely by their
difficulty. Meanwhile, the 3-parameter model accepts
all forms of data because the model
adjusts for any disparities in the data. In contrast, we
take the position that erratic data that do not fit the model’s
expectations for achievement tests should not be accepted
for analysis. Similarly, guessing is not accommodated and
is regarded as reflecting the unreliability of the
respondents.
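
For reference, the dichotomous form of the Rasch model and the 2-parameter model it is contrasted with can be written as follows. The symbols are notation assumed here rather than taken from the article: \theta_n is the ability of student n, b_i the difficulty of item i, and a_i the discrimination of item i. The partial credit items rely on an extension of the same idea that is not shown.

```latex
\begin{align}
\text{Rasch:}\quad
  P(X_{ni} = 1 \mid \theta_n, b_i)
  &= \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)} \\
\text{2-parameter:}\quad
  P(X_{ni} = 1 \mid \theta_n, a_i, b_i)
  &= \frac{\exp\bigl(a_i(\theta_n - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - b_i)\bigr)}
\end{align}
```

Setting a_i = 1 for every item in the second expression recovers the first, which is the sense in which all items share the same discriminating power under the Rasch model.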


The analysis started with assessing the
assumptions of the Rasch model, specifically, (1)
model-data fit and (2) unidimensionality.
This is a crucial step since the Rasch model
has strict assumptions that
must be met to create an equal-interval scale (Bond &
Fox, 2015). The first assumption is that the data must
fit the model’s expectations. Model-data fit refers to the
extent to which the data collected match the expectations
of the model. This assumption was examined using
the infit and outfit mean-square (MNSQ) values
generated by WINSTEPS 3.74. While both statistics are
sensitive to unexpected responses, the infit MNSQ
is more sensitive to unexpected responses from respondents
whose ability is close to the item’s difficulty, while the outfit MNSQ
is more sensitive to unexpected responses from respondents
whose ability is far from it (Linacre, 2002).
According to Bond and Fox (2015), the assumption is met
when the values of the infit and outfit MNSQs are in
the range from 0.6 to 1.4. Meanwhile, the
unidimensionality assumption holds that the items in a test measure
a single construct (Wright & Masters, 1982). This
assumption was examined using the principal
component analysis of the residuals procedure in the
software. The assumption is met when the variance
explained by the measurement dimension from the
procedure is more than 40% (Linacre, 2006).
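
As a rough illustration of how the two mean-square statistics differ, the sketch below computes them for a single dichotomous item under the Rasch model. It is not the WINSTEPS implementation: the function names and the toy data are hypothetical, and person abilities are assumed to be known rather than estimated.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(abilities, responses, difficulty):
    """Return (infit, outfit) mean-squares for one dichotomous item.

    Outfit is the plain average of squared standardized residuals, so a
    surprising response from a far-off-target person weighs heavily.
    Infit weights each squared residual by its model variance, which
    emphasizes persons whose ability is close to the item difficulty.
    """
    squared_residuals = []
    variances = []
    for ability, score in zip(abilities, responses):
        p = rasch_probability(ability, difficulty)
        squared_residuals.append((score - p) ** 2)
        variances.append(p * (1.0 - p))
    infit = sum(squared_residuals) / sum(variances)
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    return infit, outfit


def within_guideline(mnsq, low=0.6, high=1.4):
    """Check a mean-square against the 0.6 to 1.4 range cited from Bond and Fox (2015)."""
    return low <= mnsq <= high


# Hypothetical abilities (logits) and scores on an item of difficulty 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [0, 0, 1, 1, 1]    # success rises with ability
surprising_pattern = [0, 0, 1, 1, 0]  # the most able person fails the item

print(item_fit(abilities, expected_pattern, 0.0))
print(item_fit(abilities, surprising_pattern, 0.0))
```

In the second pattern the single unexpected failure comes from the person furthest from the item, so it inflates the outfit value more than the infit value.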


In this study, apart from examining the assumptions,
we also report statistics at the item level, specifically,
the item reliability and item separation indices. Item
reliability refers to the ratio of true to
observed item variance (Linacre, 2006). It provides
information on the consistency of the ordering of item
difficulty if the instrument is administered to a
comparable sample of participants. A high item reliability
statistic indicates consistent ordering of the items’
difficulty, and vice versa. Meanwhile, the item separation
index indicates the adequacy of the
measurement to distinguish between participants. For
example, if the separation index is 2, then it is possible to
distinguish the participants into two ability groups. It
should be noted that a proper measurement should be
able to distinguish clearly the ability of the participants.
For a proper measurement, the item reliability index
should be more than 0.94 (Fisher, 2007), while the
separation index should not be less than 2.0 (Bond & Fox,
2015).
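
In Rasch measurement the reliability and separation indices are two expressions of the same information, linked by G = sqrt(R / (1 - R)). The sketch below illustrates that relationship only; the function names are hypothetical and are not WINSTEPS functions.

```python
import math


def separation_from_reliability(reliability):
    """Separation index G implied by a reliability R: G = sqrt(R / (1 - R))."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1 - reliability))


def reliability_from_separation(separation):
    """Reliability R implied by a separation index G: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


# A separation of 6.10 corresponds to a reliability of about 0.97.
print(round(reliability_from_separation(6.10), 2))  # -> 0.97
```

The item separation of 6.10 and item reliability of 0.97 reported for this study are consistent with each other under this relationship.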

At the same time, statistics for each item were also 
reported. Apart from the item difficulty and the fit 
statistics, the point measure correlation (PTMEA) 
statistic was also included. The positive values of this 
statistic indicate that the particular item is working 
together with other items in the same direction to 
measure the intended construct (Bond & Fox, 2015). 
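
A point-measure correlation can be sketched as the Pearson correlation between the scores observed on one item and the ability measures of the persons who answered it. The example below is a minimal illustration with hypothetical data and a hypothetical function name, not output from the study.

```python
import math


def point_measure_correlation(scores, measures):
    """Pearson correlation between item scores and person measures (logits)."""
    n = len(scores)
    mean_score = sum(scores) / n
    mean_measure = sum(measures) / n
    covariance = sum(
        (s - mean_score) * (m - mean_measure) for s, m in zip(scores, measures)
    )
    score_ss = sum((s - mean_score) ** 2 for s in scores)
    measure_ss = sum((m - mean_measure) ** 2 for m in measures)
    return covariance / math.sqrt(score_ss * measure_ss)


# Hypothetical person measures and their scores on one item.
measures = [-2.0, -1.0, 0.0, 1.0, 2.0]
aligned_scores = [0, 0, 1, 1, 1]   # more able persons score higher
reversed_scores = [1, 1, 0, 0, 0]  # more able persons score lower

print(point_measure_correlation(aligned_scores, measures) > 0)   # True
print(point_measure_correlation(reversed_scores, measures) > 0)  # False
```

A negative value, as in the reversed pattern, would suggest that an item is working against the rest of the test, for example because of a miskeyed answer.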


Apart from the abovementioned analyses, the present
study also provides information regarding the learning
standards for the topic of Rational Numbers. In the
curriculum, learning standards are indicators of the
quality of learning and achievement that can be
measured (Ministry of Education, 2016). The analysis is
essential for identifying the most difficult-to-master learning
standards so that teachers can benefit from the
information when providing examples in the
classroom. The topic of Rational Numbers consists of 21
learning standards, which is the most for any topic in
the curriculum (Ministry of Education, 2016). The list of
learning standards for this topic is presented in Table 2.
Note that learning standard 1.2.4 (Describe the laws of
arithmetic operations, which are Identity Law,
Commutative Law, Associative Law, and Distributive
Law) was not targeted by any item since it is supposed
to be measured orally and not through test items.
Meanwhile, learning standard 1.3.1 (Represent positive
and negative fractions on number lines) was also not
targeted by any item. This is because the knowledge
and skills for this learning standard are similar to those for
learning standard 1.4.1.


FINDINGS 


In terms of model-data fit, results from the
calibration of all 447 items showed that the software
dropped three items due to a lack of responses from the
students. Meanwhile, 21 items that exhibited infit
and outfit MNSQ values outside the acceptable 0.6-1.4
guideline were manually deleted (see Table 3).
Table 2. Learning standards

Learning Standards | No. of Items
1.1.1 Recognize positive and negative numbers based on real-life situations. | 2
1.1.2 Recognize and describe integers. | 7
1.1.3 Represent integers on number lines and make connections between the values and positions of the integers with respect to other integers on the number line. | 4
1.1.4 Compare and arrange integers in order. | 8
1.2.1 Add and subtract integers using number lines or other appropriate methods. Hence, make a generalization about the addition and subtraction of integers. | 3
1.2.2 Multiply and divide integers using various methods. Hence, make a generalization about the multiplication and division of integers. | 1
1.2.3 Perform computations involving combined basic arithmetic operations of integers by following the order of operations. | 9
1.2.4 Describe the laws of arithmetic operations, which are Identity Law, Commutative Law, Associative Law, and Distributive Law. | 0
1.2.6 Solve problems involving integers. | 9
1.3.1 Represent positive and negative fractions on number lines. | 0
1.3.2 Compare and arrange positive and negative fractions in order. | 3
1.3.3 Perform computations involving combined basic arithmetic operations of positive and negative fractions by following the order of operations. | 4
1.3.4 Solve problems involving positive and negative fractions. | 1
1.4.1 Represent positive and negative decimals on number lines. | 2
1.4.2 Compare and arrange positive and negative decimals in order. | 3
1.4.3 Perform computations involving combined basic arithmetic operations of positive and negative decimals by following the order of operations. | 5
1.4.4 Solve problems involving positive and negative decimals. | 2
1.5.1 Recognize and describe rational numbers. | 2
1.5.2 Perform computations involving combined basic arithmetic operations of rational numbers by following the order of operations. | 4
1.5.3 Solve problems involving rational numbers. | 2

Table 3. Descriptive statistics

Statistics | Purpose | Guidelines | Empirical
Infit and outfit MNSQ | To ensure the empirical data matches the model’s specifications | 0.6-1.4 (Bond & Fox, 2015) | 0.62-1.38
Percentage of variance explained in the 1st contrast | To examine whether the scale is measuring a unidimensional construct | > 40% (Linacre, 2006) | 54%
Item reliability | Consistency of the ordering of item difficulty if an instrument is administered to a comparable sample | > 0.94 (Fisher, 2007) | 0.97
Item separation | The adequacy of the measurement to distinguish between participants | > 2.0 (Bond & Fox, 2015) | 6.10


Meanwhile, results from the PCA of residuals showed
that the raw variance explained by both the student and the
item measures was 54%, which was more than the
intended value of 40% (Linacre, 2006). As such, we
provided ample evidence that the unidimensionality
assumption was also fulfilled. In addition, both the item
reliability and item separation indices exceeded the
intended values.

Table 4 shows statistics for all 71 items measuring
17 learning standards. Two items were measuring the

Table 4. Item statistics

Learning standard | No. of items | Format | Item | Difficulty (in logits) | SE | Infit MNSQ | Outfit MNSQ | PTMEA
1.1.1 | 2 | MCQ | HA1 | -3.18 | 0.59 | 1.03 | 1.15 | 0.30
1.1.1 |  | MCQ | KA4 | -1.29 | 0.22 | 1.04 | 0.99 | 0.34
1.1.2 | 7 | PC | KB1 | -1.48 | 0.11 | 1.13 | 1.27 | 0.56
1.1.2 |  | PC | LB4 | -1.24 | 0.13 | 1.35 | 1.30 | 0.48
1.1.2 |  | MCQ | L11R82 | -0.69 | 0.17 | 0.98 | 0.95 | 0.24
1.1.2 |  | MCQ | TA2 | -0.42 | 0.25 | 1.04 | 1.12 | 0.17
1.1.2 |  | PC | L1R7 | -0.39 | 0.10 | 1.02 | 0.89 | 0.36
1.1.2 |  | PC | SB1 | -0.29 | 0.09 | 1.29 | 1.32 | 0.26
1.1.2 |  | PC | TB3 | 1.63 | 0.18 | 1.03 | 1.01 | 0.33
1.1.3 | 4 | PC | LB7 | -1.15 | 0.18 | 0.68 | 0.62 | 0.72
1.1.3 |  | PC | LC14 | -1.06 | 0.13 | 0.83 | 0.73 | 0.70
1.1.3 |  | PC | LOR76 | -0.90 | 0.15 | 1.04 | 0.90 | 0.21
1.1.3 |  | MCQ | LA2 | 0.45 | 0.31 | 1.08 | 1.08 | 0.22
1.1.4 | 8 | MCQ | HA2 | -2.46 | 0.42 | 1.01 | 0.83 | 0.13
1.1.4 |  | MCQ | KA1 | -2.46 | 0.29 | 0.80 | 0.51 | 0.50
1.1.4 |  | MCQ | QA1 | -1.88 | 0.20 | 0.90 | 0.83 | 0.41
1.1.4 |  | PC | LAR30 | -1.75 | 0.19 | 0.97 | 0.94 | 0.37
1.1.4 |  | MCQ | RA2 | -1.60 | 0.46 | 0.98 | 0.75 | 0.20
1.1.4 |  | MCQ | LA1 | -1.41 | 0.28 | 0.98 | 0.96 | 0.38
1.1.4 |  | MCQ | LA? | -1.03 | 0.27 | 1.00 | 1.08 | 0.35
1.1.4 |  | PC | QB1 | 0.07 | 0.16 | 0.87 | 0.85 | 0.48
1.2.1 | 3 | MCQ | MA2 | -1.8 | 0.18 | 1.04 | 1.03 | 0.39
1.2.1 |  | PC | LC2 | -0.56 | 0.17 | 1.15 | 1.19 | 0.38
1.2.1 |  | PC | HC5 | 0.37 | 0.07 | 1.00 | 1.05 | 0.42
1.2.2 | 1 | PC | QC1 | -1.19 | 0.09 | 0.72 | 0.68 | 0.65
1.2.3 | 9 | MCQ | MA3 | -3.67 | 0.26 | 1.00 | 0.73 | 0.34
1.2.3 |  | MCQ | KA2 | -2.84 | 0.33 | 1.02 | 1.05 | 0.22
1.2.3 |  | MCQ | NA3 | [illegible] | 0.26 | 1.02 | 1.01 | 0.12
1.2.3 |  | MCQ | L1R1 | -0.90 | 0.17 | 1.00 | 1.06 | 0.16
1.2.3 |  | PC | HB1 | -0.57 | 0.08 | 0.91 | 0.92 | 0.46
1.2.3 |  | PC | SC17 | 0.12 | 0.12 | 1.06 | 1.03 | 0.28
1.2.3 |  | PC | KC2 | 0.46 | 0.14 | 1.34 | 1.27 | 0.38
1.2.3 |  | PC | LB8 | 0.53 | 0.21 | 0.95 | 1.00 | 0.40
1.2.3 |  | PC | QC2 | 1.10 | 0.09 | 0.90 | 0.87 | 0.55
1.2.6 | 9 | MCQ | SA2 | -1.57 | 0.28 | 1.04 | 1.22 | 0.05
1.2.6 |  | MCQ | HA4 | -1.54 | 0.28 | 1.01 | 0.97 | 0.16
1.2.6 |  | MCQ | LA4 | -1.41 | 0.28 | 0.95 | 1.00 | 0.40
1.2.6 |  | PC | QC4 | -0.46 | 0.08 | 1.23 | 1.37 | 0.43
1.2.6 |  | MCQ | LA3 | -0.44 | 0.27 | 0.86 | 0.81 | 0.52
1.2.6 |  | PC | KC4 | 0.67 | 0.11 | 0.89 | 0.85 | 0.70
1.2.6 |  | PC | TC1 | 0.87 | 0.09 | 1.12 | 1.42 | 0.46
1.2.6 |  | PC | NC6 | 1.87 | 0.08 | 1.13 | 1.20 | 0.36
1.2.6 |  | MCQ | NA2 | 1.60 | 0.18 | 0.99 | 0.99 | 0.24
1.3.2 | 3 | MCQ | TA19 | -1.74 | 0.40 | 1.00 | 0.80 | 0.19
1.3.2 |  | PC | LC5 | 0.72 | 0.15 | 0.90 | 0.87 | 0.61
1.3.2 |  | MCQ | L11R81 | 0.83 | 0.13 | 0.96 | 0.93 | 0.34
1.3.3 | 4 | MCQ | TA1 | -1.88 | 0.43 | 1.00 | 0.88 | 0.17
1.3.3 |  | PC | MC2 | [illegible] | 0.09 | 1.15 | 1.38 | 0.68
1.3.3 |  | PC | HC2 | 0.54 | 0.10 | 0.98 | 0.95 | 0.38
1.3.3 |  | PC | QC3 | 1.36 | 0.10 | 1.03 | 1.00 | 0.44
1.3.4 | 1 | PC | L7R64 | 0.94 | 0.06 | 1.02 | 1.11 | 0.52
1.4.1 | 1 | PC | L1R6 | -1.20 | 0.11 | 1.02 | 0.91 | 0.31
1.4.1 |  | PC | L7R63 | -0.06 | 0.10 | 1.07 | 1.09 | 0.27
1.4.3 | 5 | MCQ | MA4 | -2.00 | 0.18 | 0.99 | 0.97 | 0.44
1.4.3 |  | MCQ | NA1 | -0.65 | 0.22 | 1.03 | 1.07 | 0.11
1.4.3 |  | MCQ | L3R20 | -0.63 | 0.16 | 0.94 | 0.86 | 0.49
1.4.3 |  | PC | KC3 | 0.10 | 0.13 | 1.14 | 1.03 | 0.51
1.4.3 |  | PC | L11R87 | 0.44 | 0.10 | 0.99 | 0.95 | 0.37
1.4.4 | 2 | MCQ | NA4 | -0.61 | 0.21 | 0.95 | 0.90 | 0.30
1.4.4 |  | PC | KC23 | -0.08 | 0.10 | [illegible] | 1.33 | 0.57
1.5.1 | 2 | PC | MB1 | -1.81 | 0.08 | 0.80 | 0.64 | 0.80
1.5.1 |  | PC | NC1 | 0.23 | 0.09 | 0.95 | 0.99 | 0.46
1.5.2 | 4 | PC | LC15 | 0.13 | 0.11 | 1.11 | 0.81 | 0.55
1.5.2 |  | PC | NC2 | 0.22 | 0.08 | 1.18 | 1.13 | 0.37
1.5.2 |  | PC | HB5 | 0.74 | 0.06 | 0.98 | 0.91 | 0.53
1.5.2 |  | PC | KC12 | 0.92 | 0.12 | 1.23 | 1.98 | 0.51
1.5.3 | 3 | PC | L11R86 | -0.57 | 0.13 | 1.10 | 1.15 | 0.28
1.5.3 |  | PC | MC3 | 0.23 | 0.11 | 1.08 | 0.86 | 0.62
Total | 71

*MCQ = multiple-choice question, PC = partial credit question
Table 5. Examples of easy, moderate, and difficult items

Item label | Difficulty | Level | Item
MA3 | -3.67 | Easy | 468 ÷ (6 + 3) × 2. A 26; B 84; C 104; D 162
HA1 | -3.18 | Easy | Sarah is at 4 m below sea level. What is the appropriate integer to represent Sarah’s position? A 4; B -4; C x4; D +4
KA2 | -2.84 | Easy | 5(6 - 20) - (-9) = . A -76; B 76; C 61; D -61
QB1 | 0.07 | Moderate | Arrange the following in descending order: -4, -9, -7, 0, -5, 2
KC3 | 0.10 | Moderate | Solve 1.32 - ([...]) + (-6.4)
L7R63 | -0.10 | Moderate | Complete the following number line: 0.13 (4) 0.19 0.25 (4)
TB3 | 1.63 | Difficult | The following shows a few numbers. State the prime numbers.
NA2 | 1.60 | Difficult | A submarine is 190 m below sea level. The submarine descends at 12 m per minute for the first 5 minutes. Then for the following 10 minutes, the submarine ascends 15 m per minute. Find the final position of the submarine. A 193 m; B -100 m; C 280 m; D -187 m
L5R45 | 1.42 | Difficult | The following diagram shows cards with decimal numbers: 1.125 — 3.905 [ 2514 | [ -1.008 | 008 [ 2.367 |. Arrange the cards in [...] order. [2 marks]


first learning standard, 1.1.1, and both were
multiple-choice questions (MCQ). The difficulty of
the items was estimated using the Winsteps software. Since
the mean item difficulty was set at 0, the negative signs
show that a respondent located at 0 logits has more than a 50% chance
of answering HA1 and KA4 correctly, with the latter
considered more difficult based on its larger value.
The SE indicates the standard error of the estimate.
The infit and outfit MNSQ values of 1.03 and 1.15 signify
that there was only 3% and 15% more variation than the
model expected for the on-target and off-target
participants. Finally, the positive value of 0.30 for the
point-measure correlation (PTMEA) yielded evidence
that item HA1 was working together with the other items in
measuring the participants’ ability in Rational Numbers.
In general, it seems that the teachers developed
relatively easy items for this topic since the respondents
have more than a 50% chance of answering 44
(61.97%) of the items correctly. Table 5 shows examples of easy,
moderate, and difficult items.
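
The 50% interpretation can be checked with a short calculation. The sketch below assumes the dichotomous Rasch model and a respondent located at 0 logits, the mean item difficulty; the function name is hypothetical, and the two difficulty values are those reported for items HA1 and KA4.

```python
import math


def probability_correct(ability, difficulty):
    """Dichotomous Rasch model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Difficulty measures (logits) reported for the two items on learning standard 1.1.1.
items = {"HA1": -3.18, "KA4": -1.29}

for label, difficulty in items.items():
    # Respondent at the scale origin (0 logits), an assumption of this example.
    p = probability_correct(0.0, difficulty)
    print(f"{label}: {p:.2f}")
# HA1: 0.96
# KA4: 0.78
```

Both probabilities exceed 0.5, and the less negative difficulty of KA4 translates into a lower chance of success than for HA1.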


DISCUSSION 


From Table 5, it can be observed that the results for
the easiest items were as expected. Previous studies in
Malaysia have shown that students have a high mastery
level when answering items that measure procedural
understanding, such as items MA3 and KA2. One
possible explanation is that students have been exposed to
performing series of computational tasks
since primary school (Rittle-Johnson et
al., 2001). Therefore, the students were quite familiar
with these types of items and had no problem solving
them.


Item HA1 was among the easiest items
since it was very similar to the examples in the
textbook. It is plausible that the teachers had gone
through similar items with the students in the classroom.
Materials from textbooks are always used as primary
sources for teaching and learning activities, as
demonstrated by Lepik et al. (2015). As a result, when
asked again in these tests, students need only recollect the
solution steps taught in class instead of engaging in
high-level cognitive tasks such as interpreting or evaluating.
While there is a possible explanation for the easy
items, the same cannot be said of the
difficult items. This is because, according to
the teachers, these items were expected to be easy
since they measure low-level learning
standards such as recognizing integers (item TB3) or
arranging positive and negative decimals (item L5R45).
Even though item NA2 requires students to solve a
problem, it is considered a routine problem, and the
students may have discussed it with their teachers
during lessons. Since the teachers themselves could not
explain why these items were
difficult, perhaps there is a need to retrace the
Circle the integers

-2.4    -7    [...]    59    0.25    -61

Figure 1. Item KB1 (Difficulty measure = -1.48 logits)


Table 6. Statistics for items measuring Learning Standard 1.1.2 (Recognize and describe integers)

Label | Item | Difficulty Measure
LB4 | Mark (√) for prime numbers: 27 ( ), 41 ( ), 55 ( ), 33 ( ), 43 ( ), 61 ( ) [...] | -1.24 logits
L11R82 | The following list shows prime numbers in ascending order: 17, 19, X, 29, 31, Y, 41. State the values of X and Y. A 21, 33; B 21, 37; C 23, 33; D 23, 37 | -0.69 logits
TA2 | The first prime numbers that are more than 40 are: A 43, 45, 53; B 43, 47, 53; C 41, 45, 47; D 41, 43, 47 | -0.42 logits
L1R7 | Mark (√) for prime numbers: 19 ( ), 25 ( ), 31 ( ) [...] | -0.39 logits
SB1 | Circle the integers: [...] | -0.29 logits
TB3 | The following shows several numbers. State the prime numbers: [...] | 1.63 logits


students’ responses and identify if there was a problem 
during the teaching and learning of these items. 


The calibrated pool of items developed in this
study may help teachers in multiple ways. Firstly, it can
be used to provide appropriate examples in the
classroom. It is widely accepted that teachers should
begin with easy examples to help students
understand a concept before moving progressively to
more challenging ones (Bills & Bills, 2005; Rowland,
2008). We believe that instructional
scaffolding of this kind can be best implemented using the
calibrated pool of items. To illustrate, to teach learning
standard 1.1.2 (Recognize and describe integers),
teachers may use item KB1 as the first example to
convey the concept of integers (see Figure 1). This is
because the item is the easiest, and it is conceivable that
the students would be able to understand how to
arrive at the answer. Teachers may start by
explaining the definition of integers and asking the students
whether -2.4 is an integer or not. Note that the answer is
‘no’ because it is a decimal number and not a whole
number. Teachers should then ask the students why it is
not an integer. Next, the teachers may proceed to the
subsequent number, i.e., -7, and ask the students the
same question again. After that, teachers can ask the
students to identify whether the remaining numbers in the
item, such as 59, 0.25, and -61, are
integers or not. We would expect the students to
circle 59 and -61 and provide justifications. If
the students have difficulty recognizing the integers,
teachers may provide remedial activities at this early
stage before students encounter more difficulties at a more
advanced stage.


After that, teachers may use more difficult items from 
the pool, such as item LB4, as examples to strengthen the 
students’ ability in recognizing and describing integers, 
particularly with regard to prime numbers (see 
Table 6). Teachers may then ask the students to try 
answering the more difficult items L11R82 and TA2 on 
their own, since both items not only involve prime 
numbers but also possess a similar degree of difficulty. 
Finally, teachers might want to give items L1R7, SB1, and 
TB3 as part of homework, together with other items from 
the textbook. Note that since TB3 was the most difficult 
item within this learning standard, the teacher might 
want to revisit this item in the next class to see 
whether the students have difficulties in answering it. 
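
The sequencing described above can be illustrated with a minimal sketch. It assumes dichotomously scored items and the standard dichotomous Rasch model, in which the probability of a correct response depends only on the difference between a student's ability and the item's difficulty (both in logits). The item labels are placeholders rather than the item codes from the pool, the difficulty values are the six measures shown with the sample items earlier in this section, and the student ability of 0 logits is an assumption chosen purely for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def sequence_items(difficulties):
    """Return item labels ordered from easiest to most difficult."""
    return sorted(difficulties, key=difficulties.get)


# Placeholder labels (not the item codes from the pool); the logit values
# are the six difficulty measures shown with the sample items.
difficulties = {
    "item_1": -0.39,
    "item_2": 1.63,
    "item_3": -1.24,
    "item_4": -0.29,
    "item_5": -0.69,
    "item_6": -0.42,
}

ASSUMED_ABILITY = 0.0  # assumption: a student located at the scale origin

for label in sequence_items(difficulties):
    p = rasch_probability(ASSUMED_ABILITY, difficulties[label])
    print(f"{label}: {difficulties[label]:+.2f} logits, P(correct) = {p:.2f}")
```

Under these assumptions, the easiest item is the one such a student is most likely to answer correctly, which is why it is a natural first example, while the item at 1.63 logits is the one most worth revisiting in a later class.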


Even though the pool of items can help teachers with 
instructional scaffolding by starting with easy examples 
and following them with more difficult ones, we believe 
that there is still a need to include more items. For 
example, we can see that only KB1 involves integers, 
decimals, and fractions, while the other items in this 
learning standard measured students’ knowledge of prime 
numbers. Therefore, we believe there is a need to add 
more items that measure decimals and integers, since 
these two concepts were often considered challenging for 
students (Barnett-Clark et al., 2010; Chval et al., 2013; 
Idris & Narayanan, 2011; Morge, 2011; Razak et al., 2011). 


Apart from helping teachers provide examples, the pool 
of items developed in this study also has several other 
potential uses. For example, at the end of the topic, 
teachers may use the pool of items for diagnostic 
purposes. That is, teachers can assemble some of the 
items to form a diagnostic test and then chart students’ 
performance for each learning standard. Using this 
approach, teachers can diagnose each student’s strengths 
and weaknesses with regard to the specified learning 
standards. Teachers can then plan a more focused 
intervention based on the diagnostic information. 
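
As a rough illustration of this kind of charting, the sketch below computes, for one student, the proportion of items answered correctly within each learning standard. The function name, the response pattern, and the item codes `HYP1` and `HYP2` (assigned here to standard 1.2.2) are hypothetical and serve only to show the bookkeeping; KB1, LB4, and TB3 are mapped to standard 1.1.2, as described above.

```python
from collections import defaultdict


def standard_profile(responses, item_standard):
    """Proportion of items answered correctly within each learning standard.

    responses: item code -> 1 (correct) or 0 (incorrect) for one student.
    item_standard: item code -> learning standard label.
    """
    correct = defaultdict(int)
    attempted = defaultdict(int)
    for item, score in responses.items():
        standard = item_standard[item]
        correct[standard] += score
        attempted[standard] += 1
    return {s: correct[s] / attempted[s] for s in attempted}


# HYP1 and HYP2 are hypothetical item codes used only for illustration.
item_standard = {
    "KB1": "1.1.2",
    "LB4": "1.1.2",
    "TB3": "1.1.2",
    "HYP1": "1.2.2",
    "HYP2": "1.2.2",
}

# Hypothetical response pattern for a single student.
responses = {"KB1": 1, "LB4": 1, "TB3": 0, "HYP1": 0, "HYP2": 1}

profile = standard_profile(responses, item_standard)
for standard, proportion in sorted(profile.items()):
    print(f"Standard {standard}: {proportion:.0%} correct")
```

A profile of this kind is only as informative as the number of items behind each standard, so a standard measured by one or two items should be interpreted with caution.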


For students who demonstrate a high level of 
proficiency, teachers can engage them in enrichment 
activities, such as assigning more challenging items 
from the item pool. Also, teachers can develop different 
test forms from the pool so that, even though the 
students do not sit for the same test, their performance 
can still be compared. For instance, a teacher might 
choose ten items from the pool to create a short test on 
Rational Numbers to be administered to one particular 
class. The teacher can then select different items of 
similar difficulty to create another test to be 
administered to another class. Since all items were 
calibrated on a common scale, the performance of 
students in both classes can still be compared despite 
them answering different sets of test items (Holland & 
Dorans, 2006). This practice helps maintain the security 
of the test items. 
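
To make the comparability argument concrete, the following sketch estimates a student's ability from a raw score on a given test form, assuming dichotomous items and known (calibrated) item difficulties. Under the Rasch model the raw score is a sufficient statistic for ability, so the maximum-likelihood estimate is the ability at which the expected score equals the observed raw score; here it is found by Newton-Raphson iteration. The two forms and their difficulty values are hypothetical, and this is an illustration of the principle rather than the estimation procedure used in this study.

```python
import math


def expected_score(ability, difficulties):
    """Expected raw score on a form under the dichotomous Rasch model."""
    return sum(1.0 / (1.0 + math.exp(b - ability)) for b in difficulties)


def estimate_ability(raw_score, difficulties, tol=1e-6, max_iter=50):
    """Maximum-likelihood ability (in logits) for a raw score on a form."""
    if not 0 < raw_score < len(difficulties):
        raise ValueError("zero and perfect scores have no finite estimate")
    ability = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(b - ability)) for b in difficulties]
        residual = raw_score - sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Clamp the Newton step to keep the iteration stable.
        step = max(-1.0, min(1.0, residual / information))
        ability += step
        if abs(step) < tol:
            break
    return ability


# Hypothetical forms: form_b is uniformly 0.5 logits harder than form_a.
form_a = [-1.0, -0.5, 0.0, 0.5, 1.0]
form_b = [-0.5, 0.0, 0.5, 1.0, 1.5]

theta_a = estimate_ability(3, form_a)
theta_b = estimate_ability(3, form_b)
print(f"Score 3/5 on form A: {theta_a:.2f} logits")
print(f"Score 3/5 on form B: {theta_b:.2f} logits")
```

In this sketch, the same raw score on the harder form maps to a higher ability estimate, which is what allows students who answered different sets of items to be placed on one scale.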


CONCLUSION 


The current study described the process of 
developing a calibrated pool of items in the topic of 
Rational Numbers to facilitate teachers in giving 
examples in classroom learning. We were able to pool 71 
high-quality test items with varying degrees of 
difficulty that were calibrated on a common scale. We 
were also able to provide evidence of validity and 
reliability of all items to measure the students’ ability in 
rational numbers. Subsequently, we presented 
indications that the difficulty measure of each item is 
highly reproducible when subsets of the pool items were 
administered to other groups of students. We also 
furnished some guidelines on how teachers could use 
the pool of items in giving examples as well as for other 
assessment purposes. 


Whilst several encouraging outcomes were 
demonstrated, the present study is bounded by a few 
limitations. Firstly, this study rests on a strong 
assumption that the test items were administered to a 
relatively homogenous sample of students. We would 
have little knowledge about the results of the calibration 
if the samples were drawn from a heterogeneous 
population. Secondly, even with the large sample of 
items tested in this study, we believe that the items still 
represent a limited sample of stimuli for measuring 
students’ ability in rational numbers. As such, there is 
still a need to develop more items before the pool can 
be used to its full potential for formative purposes, 
especially with regard to learning standards 1.1.1, 
1.2.2, 1.3.4, 1.4.4, and 1.5.1, since we believe each of 
these standards should be measured by at least three 
items. 